SUMMARY

A benchmark test program that measures supercomputer performance has been developed for
the use of the NAS (Numerical Aerodynamic Simulation) Projects Office at NASA Ames Research
Center. This paper describes the benchmark program in detail and discusses the specific
ground rules for running it as a performance test.
INTRODUCTION


A benchmark test program has been developed for use by the NAS program at NASA Ames
Research Center to aid in the evaluation of supercomputer performance. This program
consists of seven Fortran test kernels that perform calculations typical of Ames
supercomputing. The performance of a supercomputer system on this program is expected to
provide an accurate projection of the system's performance on actual NAS program computer
codes. This paper describes the test program in detail and lists the specific ground rules
that have been established for running the program as a performance test.


PROGRAM DESCRIPTION 

The NAS Kernel Benchmark Program consists of approximately 1000 lines of Fortran code,
organized into seven separate tests. Each test consists of a loop that repeatedly calls a
particular subroutine. These subroutines were chosen after a review of many of the
calculations currently being performed on Ames supercomputers and on the recommendations of
a number of Ames scientists and programmers, particularly those working on computational
fluid dynamics problems. In most cases, the subroutines have been extracted from programs
currently in use and incorporated into the NAS Kernel Benchmark Program with only minor
changes. The test kernels are therefore believed to be a representative cross section of
expected NAS program supercomputing, and the performance of a computer system (both its
hardware and its Fortran compiler) on these tests should be a reliable predictor of its
actual performance on NAS user programs.

All seven selected programs emphasize the vector performance of a computer system. Almost
all of the floating-point operations in these Fortran subroutines are contained in loops
that can be computed with vector operations, provided that the Fortran compiler of the
system being tested is sufficiently powerful in its vectorization analysis and that the
computer's hardware includes the necessary vector instructions. Most serious supercomputer
programs currently in use at Ames are fairly highly vectorized, and virtually all programs
developed in the future are expected to be designed to use the vector processing
capabilities of supercomputers effectively. Some programs with substantial scalar
processing will continue to be used, but their number is expected to decline as algorithms
and codes better suited to vector processing are developed. Another reason for emphasizing
vector performance in these kernels is that it is not very meaningful to average, even in
a harmonic-mean sense, the performance of a supercomputer on a scalar code with its
performance on a vector code.
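To see why averaging scalar and vector rates says little, the short Python calculation below combines two hypothetical rates (200 MFLOPS on a vectorized code and 10 MFLOPS on a scalar code, with equal operation counts; neither figure comes from this paper). It is a sketch of the weighted harmonic mean, which equals total operations divided by total time; the function name is illustrative only.

```python
def harmonic_mean_rate(rates, op_shares=None):
    """Weighted harmonic mean of MFLOPS rates.

    When op_shares are each code's share of the floating-point operations,
    the result equals total operations divided by total run time.
    """
    if op_shares is None:
        op_shares = [1.0] * len(rates)
    total_share = sum(op_shares)
    return total_share / sum(s / r for s, r in zip(op_shares, rates))


# Hypothetical rates, chosen only for illustration.
vector_rate = 200.0  # MFLOPS on a well-vectorized code
scalar_rate = 10.0   # MFLOPS on a largely scalar code
combined = harmonic_mean_rate([vector_rate, scalar_rate])
print(f"{combined:.2f} MFLOPS")  # 19.05 MFLOPS
```

The combined figure is less than a tenth of the vector rate: the scalar code accounts for about 95% of the run time, so it dominates the average and the result describes neither kind of code well.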

This program tests not only the hardware execution speed of a computer but also the
effectiveness of its Fortran compiler. Even a phenomenally fast hardware design is
worthless unless it is coupled with a Fortran compiler that can fully utilize it.
Furthermore, it is becoming increasingly clear that vectorization and other optimizations
must either be completely automatic or be very easy to direct. If effective use of a
computer requires massive redesign of otherwise well-written, standard Fortran-77 code, or
if a high level of performance is possible only with considerable human intervention, then
the actual usable power of the computer is severely reduced.

The seven test kernels of the NAS Kernel Benchmark Program have, for the most part, been
developed quite recently. As a result, they represent Fortran programs designed and written
for modern vector computation, as opposed to the somewhat dated code used in other popular
benchmark programs. It might be argued that the test is inherently biased toward the Cray
computers, since most of these kernels were written on a Cray X-MP. However, considerable
care was taken in selecting the kernels to ensure that none of them contained constructs
that would unduly favor the Cray line. As far as possible, the subroutines selected are
straightforward Fortran code, intelligently written with loops that can be executed with
vector operations, but otherwise neutral toward any particular machine. In fact, in the
process of selecting these kernels, some of them were found to cause unforeseen
difficulties for the Cray compiler. They were nevertheless left in the test suite to
maintain objectivity.

The NAS Kernel Benchmark Program measures performance in MFLOPS (millions of
floating-point operations per second). The number of floating-point operations credited to
each arithmetic operation and intrinsic function used in the test kernels is shown in
Table 1. These numbers are based on actual counts of 64-bit floating-point operations in
published algorithms.
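As an illustration of how the Table 1 weights turn into the operation totals the tests report, the Python sketch below applies a subset of them to two kernels. It assumes the operation mix of the unrolled MXM statement (four real multiplies and four real adds) and of a standard radix-2 FFT butterfly (one complex multiply by a twiddle factor, one complex add and one complex subtract). The dictionary keys and function names are hypothetical labels, not identifiers from the benchmark.

```python
import math

# A subset of the Table 1 weights: counted 64-bit floating-point operations
# per Fortran operation. The keys are informal labels used only here.
FLOP_WEIGHTS = {
    "real+real": 1,
    "real*real": 1,
    "complex*real": 2,
    "complex+complex": 2,
    "complex-complex": 2,
    "complex*complex": 6,
}


def weighted_flops(op_counts):
    """Convert tallies of Fortran operations into counted flops."""
    return sum(FLOP_WEIGHTS[op] * n for op, n in op_counts.items())


def mxm_flops(l_size, m_size, n_size):
    """Flops for one call of the 4-way unrolled MXM (m_size a multiple of 4)."""
    per_statement = weighted_flops({"real*real": 4, "real+real": 4})
    return per_statement * l_size * n_size * (m_size // 4)


def fft_flops(k):
    """Flops for one complex radix-2 FFT of length k (k a power of 2)."""
    butterfly = weighted_flops(
        {"complex*complex": 1, "complex+complex": 1, "complex-complex": 1}
    )
    return butterfly * (k // 2) * int(math.log2(k))


def fft_test_iteration_flops(m, n):
    """Scale by 1/(m*n), then a forward and an inverse 2-D FFT."""
    forward_2d = n * fft_flops(m) + m * fft_flops(n)  # columns, then rows
    scale = weighted_flops({"complex*real": m * n})
    return scale + 2 * forward_2d


print(mxm_flops(256, 128, 64))             # 4194304 = 2 * L * M * N
print(fft_test_iteration_flops(128, 256))  # 4980736 = MN * (2 + 10 * log2(MN))
```

Under these assumptions the totals reproduce the operation-count formulas used by MXMTST (2 L M N per call) and FFTTST (MN (2 + 10 log2 MN) per iteration) in the listing.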

Note that this program measures only MFLOPS rates. Disk I/O, operating system efficiency,
and other important factors in overall performance are not measured by this benchmark.
Also, several of the test subroutines perform a significant number of memory-move, integer,
and logical operations, none of which is included in the floating-point operation count.

The following is a description of the seven proposed Fortran test kernels. Other features
are summarized in Table 2.

1. MXM - This subroutine performs the usual matrix product on two input matrices. 
The subroutine employs a four-way unrolled, outer product matrix multiply algorithm
that is especially effective for most vector computers. See [1] for a discussion
of this algorithm.

2. CFFT2D - This test performs a complex radix-2 FFT on a two-dimensional input array,
returning the result in place. The test kernel actually consists of two subroutines that
perform FFTs along the first and second dimensions of the array, respectively, exploiting
the fact that the one-dimensional FFTs along each dimension are independent of one
another. See [2] for a discussion of the FFT algorithm used.

3. CHOLSKY - This subroutine performs a Cholesky decomposition in parallel on 
Table 1: Floating-point Operation Counts

FIRST       FUNCTION    SECOND      FLOATING
ARGUMENT                ARGUMENT    PT. OPS.

Real        +           Real             1
Real        -           Real             1
Real        *           Real             1
1           /           Real             2
Real        /           Real             3
Real        **          2                1
Real        **          Real            45
Complex     *           Real             2
Complex     /           Real             4
1           /           Complex          7
Real        /           Complex          9
Complex     +           Complex          2
Complex     -           Complex          2
Complex     *           Complex          6
Complex     /           Complex         13
Real        SQRT                        12
Real        EXP                         18
Real        LOG                         25
Real        SIN                         25
Real        ATAN                        25
Complex     ABS                         15
Complex     EXP                         70
Complex     LOG                         65








Table 2: Kernel Features 





KERNEL 



FEATURE 

1 

2 

3 

4 

5 

6 

7 

Two dimensional arrays 

X 

X 



X 

X 

X 

Multidimensional arrays 



X 

X 



X 

Dimensions with colons 



X 





Integer arrays 


X 



X 

X 


Integer functions in indices 





X 

X 


IF statements in inner loops 






X 


Scientific function calls 


X 

X 


X 

X 


Complex arithmetic 


X 



X 

X 


Complex function calls 





X 

X 


Inner loop memory strides 

1 

1 

1 

1 

1 

1 

128 



2 

4 

2 

2 





256 


750 

500 







900 




Inner loop vector lengths 

256 

128 

250 

28 

5 

100 

128 



256 



100 

500 







500 

1000 



a set of input matrices, which are actually input to the subroutine as a single 
three-dimensional array. 

4. BTRIX - This kernel performs a block tridiagonal matrix solution along one dimension of
a four-dimensional array.

5. GMTRY - This subroutine sets up arrays for a vortex method solution and performs
Gaussian elimination on the resulting array. This kernel is noted for a number of loops
that are challenging to vectorize.

6. EMIT - Also extracted from a vortex code, this subroutine creates new vortices 
according to certain boundary conditions. 

7. VPENTA - This subroutine simultaneously inverts three matrix pentadiagonals in 
a highly parallel fashion. 
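The loop order of MXM (item 1) can be sketched in Python as below. This illustrates only the 4-way unrolled outer-product ordering, under the assumption that matrices are stored as lists of rows; it makes no attempt to reproduce the vector performance of the Fortran routine, and the function names are hypothetical.

```python
import random


def mxm_unrolled(a, b):
    """Return C = A B using MXM's 4-way unrolled outer-product loop order.

    a is an l x m matrix and b an m x n matrix, both stored as lists of rows;
    m must be a multiple of 4, as in the Fortran routine.
    """
    rows, inner, cols = len(a), len(b), len(b[0])
    if inner % 4 != 0:
        raise ValueError("inner dimension must be a multiple of 4")
    c = [[0.0] * cols for _ in range(rows)]
    for j in range(0, inner, 4):      # four columns of A / rows of B per pass
        for k in range(cols):
            b0, b1, b2, b3 = b[j][k], b[j + 1][k], b[j + 2][k], b[j + 3][k]
            # Innermost loop over i. In Fortran's column-major storage this
            # walks down columns of A and C with unit stride, so it vectorizes.
            for i in range(rows):
                ai = a[i]
                c[i][k] = (c[i][k] + ai[j] * b0 + ai[j + 1] * b1
                           + ai[j + 2] * b2 + ai[j + 3] * b3)
    return c


def mxm_naive(a, b):
    """Reference triple loop used to check the unrolled version."""
    rows, inner, cols = len(a), len(b), len(b[0])
    return [[sum(a[i][p] * b[p][k] for p in range(inner)) for k in range(cols)]
            for i in range(rows)]


rng = random.Random(1)
a = [[rng.random() for _ in range(8)] for _ in range(5)]   # 5 x 8
b = [[rng.random() for _ in range(3)] for _ in range(8)]   # 8 x 3
c1, c2 = mxm_unrolled(a, b), mxm_naive(a, b)
print(max(abs(x - y) for r1, r2 in zip(c1, c2) for x, y in zip(r1, r2)))
```

Unrolling by four means each element of C is loaded and stored once for every four multiply-add pairs, rather than once per pair, which cuts memory traffic on vector machines.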

In each of the above test subroutines, the input data arrays are filled by a portable
pseudorandom number generator in the calling program. This ensures that all computers
running the NAS Kernel Benchmark Program perform the required calculations on the same
numbers, and it also permits the output results to be checked for accuracy. Each of the
seven tests is independent of the others: none depends on results calculated in a previous
test. Thus program alterations that improve the execution speed of one test kernel may be
made without fear of affecting the other kernels.
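The generator described in the comments of MXMTST in the listing is x(n+1) = 5^7 x(n) mod 2^30, carried out on fractions T = x/2^30 as T = MOD(F7*T, 1.). The Python sketch below is an illustration rather than the benchmark's own code; it generates the same sequence both ways. Because 5^7 < 2^17 and x < 2^30, every product stays below 2^47, which is why 47 mantissa bits suffice for the floating-point form to be exact (Python's 64-bit floats carry 53). The function names are hypothetical.

```python
F7 = 5 ** 7    # multiplier 78125, as F7 in the listing
T30 = 2 ** 30  # modulus 1073741824, as T30 in the listing


def uniform_float(count, seed=F7):
    """Floating-point form used by the tests: T = MOD(F7 * T, 1.)."""
    t = seed / T30
    values = []
    for _ in range(count):
        t = (F7 * t) % 1.0   # F7 * t is exact: the numerator stays below 2**47
        values.append(t)
    return values


def uniform_exact(count, seed=F7):
    """Same recursion in exact integer arithmetic: x = 5**7 * x mod 2**30."""
    x = seed
    values = []
    for _ in range(count):
        x = (F7 * x) % T30
        values.append(x / T30)
    return values


print(uniform_float(3) == uniform_exact(3))  # True
```

Since the multiplier is congruent to 5 mod 8 and the seed is odd, the integer sequence has the full period of 2^28 quoted in the listing.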
GROUND RULES FOR PERFORMANCE TESTING

Worlton’s recent article [3] pointed out some of the difficulties involved in
supercomputer performance testing. Most of these problems result from a lack of
well-defined controls on the tests. For instance, in some recent test results, one vendor
was apparently allowed to perform some minor tuning and insert compiler directives, whereas
the other was not. In other cases, confusion has resulted because researchers did not
carefully note which version of a vendor’s compiler was used in their tests. Some vendors
have claimed amazingly high performance rates for their computers which, on closer
analysis, were achieved only by massive recoding of the test kernels and by the use of
assembly code. As a result, many recent comparisons of supercomputer performance have
degenerated into shouting matches that have generated more heat than light.

In consideration of such problems, some strict ground rules have been established for 
using the NAS Kernel Benchmark Program to evaluate supercomputer performance. Also, 
four levels of tests have been defined, so that the effects of varying amounts of tuning may 
be assessed. These different levels will also enable the NAS program to differentiate the 
performance of the hardware from that of the compiler. If the compiler is truly effective, 
then a relatively small amount of tuning should be sufficient to achieve close to the full 
potential of the hardware. The four test levels are defined as follows: 

1. Level 0 (“dusty deck”): For this test, the NAS Kernel Benchmark Program must be 
run without any changes to improve performance. If any alterations are required 
for compatibility purposes (for example, to define the timing function), they must 
be made by NAS program personnel. 

2. Level 20 (“minor tuning”): For this test, a few minor alterations may be made 
to the code to enhance performance. These changes may include, for example, 
compiler directives to assist the compiler’s vectorization analysis or changes to 
array dimensions to avoid disadvantageous memory strides. No more than 20 lines 
of code in the entire program file may be inserted or modified. 

3. Level 50 (“major tuning”): For this test, more extensive modifications may be made 
to the code to enhance performance. For example, some loops may be rewritten to 
avoid constructs that cause difficulties for the compiler or the hardware. A total 
of up to 50 lines of the program file may be inserted or modified for this test. 

4. Level 1000 (“customized code”): For this test, large-scale coding changes are allowed
to improve performance. Entire subroutines may be rewritten to avoid difficult constructs.
There is no limit to the number of lines of code that may be inserted or modified.
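The line budgets of the four levels amount to a simple rule. The hypothetical Python helper below returns the lowest level under which a set of performance-related modifications could be reported. It assumes that compatibility changes made by NAS program personnel (permitted at Level 0) are not counted, and it does not check the other ground rules, such as conformance to Fortran-77.

```python
def minimum_level(lines_changed):
    """Lowest test level whose line budget covers the modifications.

    lines_changed: lines inserted or modified in the program file to
    improve performance (compatibility edits are assumed excluded).
    """
    if lines_changed < 0:
        raise ValueError("lines_changed cannot be negative")
    if lines_changed == 0:
        return 0       # "dusty deck": no performance changes
    if lines_changed <= 20:
        return 20      # minor tuning
    if lines_changed <= 50:
        return 50      # major tuning
    return 1000        # customized code: no line limit


print([minimum_level(n) for n in (0, 12, 20, 21, 50, 400)])
# [0, 20, 20, 50, 50, 1000]
```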


For all four levels of tests, any modifications made to the program code must conform to
the ANSI Fortran-77 standard [4]. In particular, absolutely no assembly code is allowed
within the program file, and no external programs may be referenced other than the
standard Fortran functions. Fortran subprograms may be referenced only if their Fortran
code is included in the program file and conforms to the other requirements in this paper.
Finally, no modification to the algorithms in the code may change the number of
floating-point operations performed.

The precision level of all floating-point data and operations in the program must be 
64 bits, with at least 47 mantissa bits. As a test of the hardware precision, and to ensure 
that any modifications made to the program file have not fundamentally changed the 
calculations being performed, an accuracy check is included with each of the seven tests. 
These checks are performed by comparing a selected result from each of the programs with 
a reference value stored in the program code and then computing the fractional error. The 
total of the fractional errors from the seven programs must be less than 5 × 10^-10.
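The accuracy test reduces to a few lines. In the Python sketch below, the reference values are the ANS constants that appear in the listing for MXMTST and GMTTST, while the "computed" results are made-up numbers used only to exercise the check; a real run would take all seven results from the benchmark itself. Function names are hypothetical.

```python
TOLERANCE = 5e-10  # limit on the sum of the fractional errors


def fractional_error(result, reference):
    """|result - reference| / |reference|, as computed by each test."""
    return abs((result - reference) / reference)


def accuracy_check(pairs, tolerance=TOLERANCE):
    """Return the total fractional error and whether it is within tolerance.

    pairs: (computed result, stored reference value) for each test.
    """
    total = sum(fractional_error(r, ref) for r, ref in pairs)
    return total, total < tolerance


# Reference values from the listing; the computed values are hypothetical.
pairs = [
    (35.2026179738725, 35.2026179738722),     # MXM
    (-2.57754233214170, -2.57754233214174),   # GMTRY
]
total, ok = accuracy_check(pairs)
print(f"total error {total:.3e}, passes: {ok}")
```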

The NAS Kernel Benchmark Program automatically calculates performance statistics and
writes a report of them to Fortran unit 6. This report includes the results of the accuracy
checks, the number of floating-point operations performed, the CPU run times, and the
resulting MFLOPS rates for each test, together with the total error, the total
floating-point operation count, the total CPU time, and the overall MFLOPS rate.
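The totals follow the main program in the listing: each rate is 10^-6 × operations / seconds, and the overall rate divides total operations by total CPU time. That is the operation-weighted harmonic mean of the individual rates, not their arithmetic mean. The Python sketch below reproduces the calculation with hypothetical timings for two tests; the function name is illustrative.

```python
def mflops_report(fp_ops, seconds):
    """Per-test MFLOPS rates and the overall rate, as in the main program."""
    rates = [1e-6 * fp / tm for fp, tm in zip(fp_ops, seconds)]
    overall = 1e-6 * sum(fp_ops) / sum(seconds)
    return rates, overall


# Hypothetical figures for two tests, chosen only for illustration.
rates, overall = mflops_report([4.0e8, 2.0e8], [2.0, 4.0])
print([round(r, 6) for r in rates], round(overall, 6))  # [200.0, 50.0] 100.0
```

The arithmetic mean of the two rates would be 125 MFLOPS, overstating what the machine delivered over the six seconds of CPU time.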

Normally only uniprocessor results are tabulated. If desired, multiprocessor performance
may be estimated by running the benchmark program simultaneously on each of the individual
processors. A multiprocessing performance figure may then be computed by averaging the
timings from the runs on the individual processors. Although no explicit multiprocessing
is performed in this manner, such an exercise measures the amount of interprocessor
resource contention, which is a significant factor in multiprocessing. In this way the
performance increase to be expected from multiprocessor computation can be estimated
without the laborious modifications usually required to invoke true multiprocessing.
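The text does not spell out how the averaged timing becomes a performance figure. One natural reading, assumed here, is that p processors together complete p copies of the work in the average time, so the estimated rate is p × operations / average time; comparing the average with a standalone uniprocessor time then shows the slowdown from resource contention. The Python sketch below uses hypothetical numbers and a hypothetical function name.

```python
def multiprocessor_estimate(fp_ops, per_cpu_seconds, standalone_seconds):
    """Estimate a multiprocessor rate from simultaneous single-CPU runs.

    fp_ops: operations performed by one copy of the benchmark.
    per_cpu_seconds: CPU time reported by each simultaneous copy.
    standalone_seconds: CPU time of one copy on an otherwise idle machine.
    """
    p = len(per_cpu_seconds)
    mean_seconds = sum(per_cpu_seconds) / p
    rate = 1e-6 * p * fp_ops / mean_seconds                # MFLOPS, all CPUs
    contention_factor = standalone_seconds / mean_seconds  # 1.0 = no contention
    return rate, contention_factor


rate, factor = multiprocessor_estimate(1.2e9, [11.0, 12.0, 12.0, 13.0], 10.0)
print(f"{rate:.1f} MFLOPS, {factor:.3f} of standalone speed")
```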
      PROGRAM NASKER
C
C   NAS KERNEL BENCHMARK PROGRAM
C   12/17/84   DAVID H BAILEY
C
      CHARACTER*8 PN(8)
      REAL ER(8), FP(8), TM(8), RT(8)
      COMMON /ARRAYS/ DATA(360000)
      DATA PN/'MXM', 'CFFT2D', 'CHOLSKY', 'BTRIX', 'GMTRY', 'EMIT',
     $  'VPENTA', 'TOTAL'/
C
      WRITE (6, 1)
    1 FORMAT (/16X, 'THE NAS KERNEL BENCHMARK PROGRAM'//)
C
      CALL MXMTST (ER(1), FP(1), TM(1))
      CALL FFTTST (ER(2), FP(2), TM(2))
      CALL CHOTST (ER(3), FP(3), TM(3))
      CALL BTRTST (ER(4), FP(4), TM(4))
      CALL GMTTST (ER(5), FP(5), TM(5))
      CALL EMITST (ER(6), FP(6), TM(6))
      CALL VPETST (ER(7), FP(7), TM(7))
C
C   SUM THE ERRORS, OPERATION COUNTS AND TIMES.  THE OVERALL RATE IS
C   TOTAL OPERATIONS DIVIDED BY TOTAL TIME.
C
      TE = 0.
      TF = 0.
      TT = 0.
      DO 100 I = 1, 7
        TE = TE + ER(I)
        TF = TF + FP(I)
        TT = TT + TM(I)
        RT(I) = 1E-6 * FP(I) / TM(I)
  100 CONTINUE
      ER(8) = TE
      FP(8) = TF
      TM(8) = TT
      RT(8) = 1E-6 * TF / TT
C
      WRITE (6, 2) (PN(I), ER(I), FP(I), TM(I), RT(I), I = 1, 8)
    2 FORMAT (' PROGRAM', 8X, 'ERROR', 10X, 'FP OPS', 7X, 'SECONDS',
     $  6X, 'MFLOPS'//7(1X, A8, 1P2E15.4, 0PF12.4, F12.2/)/
     $  1X, A8, 1P2E15.4, 0PF12.4, F12.2//)
C
      END
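C
      FUNCTION CPTIME ()
C
C   RETURNS THE CPU TIME SINCE THE LAST CALL TO CPTIME.
C   THIS SUBPROGRAM MAY BE CHANGED AS NEEDED FOR A PARTICULAR COMPUTER
C   SYSTEM WITHOUT PENALTY, PROVIDED IT PERFORMS THIS FUNCTION.
C   SECOND () IS A SYSTEM-SUPPLIED CPU TIMER, NOT A FORTRAN-77 INTRINSIC.
C   TX IS SAVED SO THAT ITS VALUE IS RETAINED BETWEEN CALLS.
C
      SAVE TX
      DATA TX/0./
C
      T = SECOND ()
      CPTIME = T - TX
      TX = T
      RETURN
      END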
C
      SUBROUTINE COPY (N, A, B)
C
C   ARRAY COPY ROUTINE
C
      REAL A(N), B(N)
C
      DO 100 I = 1, N
        B(I) = A(I)
  100 CONTINUE
      RETURN
      END
C
      SUBROUTINE MXMTST (ER, FP, TM)
C
C   FLOATING-POINT MATRIX MULTIPLY TEST
C
      PARAMETER (L = 256, M = 128, N = 64, F7 = 78125.,
     $  T30 = 1073741824.)
      COMMON /ARRAYS/ A(L,M), S1, B(M,N), S2, C(L,N)
      DATA IT/100/, ANS/35.2026179738722/
C
C   INITIALIZATION
C
C   THE ARRAYS A AND B ARE FILLED WITH PSEUDO-RANDOM (0., 1.) DATA
C   USING A RANDOM NUMBER GENERATOR BASED ON THE RECURSION
C     X(N+1) = 5**7 * X(N)  (MOD 2**30)
C   THIS RECURSION WILL GENERATE 2**28 (APPROX. 268 MILLION) NUMBERS
C   BEFORE REPEATING.  FOR THIS SCHEME TO WORK PROPERLY, THE HARDWARE
C   MULTIPLY OPERATION MUST BE CORRECT TO 47 BITS OF PRECISION.
C   THIS SAME SCHEME IS USED TO INITIALIZE DATA ARRAYS FOR ALL TESTS.
C
      T = F7 / T30
      DO 100 J = 1, M
      DO 100 I = 1, L
        T = MOD (F7 * T, 1.)
        A(I,J) = T
  100 CONTINUE
C
      DO 110 J = 1, N
      DO 110 I = 1, M
        T = MOD (F7 * T, 1.)
        B(I,J) = T
  110 CONTINUE
C
      TM = CPTIME ()
C
C   TIMING TEST
C
      DO 120 II = 1, IT
        CALL MXM (A, B, C, L, M, N)
  120 CONTINUE
C
      TM = CPTIME ()
      ER = ABS ((C(19,19) - ANS) / ANS)
      FP = 2. * IT * L * M * N
C
      RETURN
      END

      SUBROUTINE MXM (A, B, C, L, M, N)
      DIMENSION A(L,M), B(M,N), C(L,N)
C
C   4-WAY UNROLLED MATRIX MULTIPLY ROUTINE FOR VECTOR COMPUTERS.
C   M MUST BE A MULTIPLE OF 4.  CONTIGUOUS DATA ASSUMED.
C   DH BAILEY  11/15/84
C
      DO 100 K = 1, N
      DO 100 I = 1, L
        C(I,K) = 0.
  100 CONTINUE
C
      DO 110 J = 1, M, 4
      DO 110 K = 1, N
      DO 110 I = 1, L
        C(I,K) = C(I,K) + A(I,J) * B(J,K)
     $    + A(I,J+1) * B(J+1,K) + A(I,J+2) * B(J+2,K)
     $    + A(I,J+3) * B(J+3,K)
  110 CONTINUE
C
      RETURN
      END

C
      SUBROUTINE FFTTST (ER, FP, TM)
C
C   2-D FFT TEST PROGRAM
C
      PARAMETER (M = 128, N = 256, M1 = 128, F7 = 78125.,
     $  T30 = 1073741824.)
      COMPLEX X, Y, CT
      COMMON /ARRAYS/ X(M1,N), W1(M), W2(N), IP(2*N)
      DATA IT/100/, ANS/0.834799941219277/
C
C   INITIALIZE
C
      AMN = M * N
      RMN = 1. / AMN
      T2 = F7 / T30
      DO 100 J = 1, N
      DO 100 I = 1, M
        T1 = MOD (F7 * T2, 1.)
        T2 = MOD (F7 * T1, 1.)
        X(I,J) = CMPLX (T1, T2)
  100 CONTINUE
C
      CALL CFFT2D1 (0, M, M1, N, X, W1, IP)
      CALL CFFT2D2 (0, M, M1, N, X, W2, IP)
      TM = CPTIME ()
C
C   TEST ITERATIONS
C
      DO 120 K = 1, IT
        DO 110 J = 1, N
        DO 110 I = 1, M
          X(I,J) = RMN * X(I,J)
  110   CONTINUE
C
        CALL CFFT2D1 (1, M, M1, N, X, W1, IP)
        CALL CFFT2D2 (1, M, M1, N, X, W2, IP)
        CALL CFFT2D2 (-1, M, M1, N, X, W2, IP)
        CALL CFFT2D1 (-1, M, M1, N, X, W1, IP)
  120 CONTINUE
C
      TM = CPTIME ()
      ER = ABS ((REAL (X(19,19)) - ANS) / ANS)
      FP = IT * AMN * (2. + 10. * LOG (AMN) / LOG (2.))
C
      RETURN
      END

C
      SUBROUTINE CFFT2D1 (IS, M, M1, N, X, W, IP)
C
C PERFORMS COMPLEX RADIX 2 FFTS ON THE FIRST DIMENSION OF THE 2-D ARRAY X
C DH BAILEY  11/15/84
C
      COMPLEX X(M1,N), W(M), CT, CX
      INTEGER IP(2,M)
      DATA PI /3.141592653589793/
C
C   IF IS = 0 THEN INITIALIZE ONLY
C
      M2 = M / 2
      IF (IS .EQ. 0) THEN
        DO 100 I = 1, M2
          T = 2. * PI * (I - 1) / M
          W(I) = CMPLX (COS (T), SIN (T))
  100   CONTINUE
        RETURN
      END IF
C
C   PERFORM FORWARD OR BACKWARD FFTS ACCORDING TO IS = 1 OR -1
C
      DO 110 I = 1, M
        IP(1,I) = I
  110 CONTINUE
      L = 1
      I1 = 1
C
  120 I2 = 3 - I1
      DO 130 J = L, M2, L
        CX = W(J-L+1)
        IF (IS .LT. 0) CX = CONJG (CX)
        DO 130 I = J-L+1, J
          II = IP(I1,I)
          IP(I2,I+J-L) = II
          IM = IP(I1,I+M2)
          IP(I2,I+J) = IM
          DO 130 K = 1, N
            CT = X(II,K) - X(IM,K)
            X(II,K) = X(II,K) + X(IM,K)







C
      TM = CPTIME ()
      ER = ABS ((S(13,19,19,1) - ANS) / ANS)
      FP = IT * MD * (LE - 1) * 19165.
C
      RETURN
      END

C
      SUBROUTINE BTRIX (JS, JE, LS, LE, K)
C
C   VECTORIZED BLOCK TRI-DIAGONAL SOLVER IN THE J DIRECTION
C   FOR K = CONSTANT PLANES
C
C   11/15/84  D H BAILEY  MODIFIED FOR NAS KERNEL TEST
C
      PARAMETER (JD = 30, KD = 30, LD = 30, MD = 30)
      COMMON /ARRAYS/ S(JD,KD,LD,5), A(5,5,MD,MD), B(5,5,MD,MD),
     $  C(5,5,MD,MD)
C
      DIMENSION U12(MD), U13(MD), U14(MD), U15(MD), U23(MD),
     $  U24(MD), U25(MD), U34(MD), U35(MD), U45(MD)
      REAL L11(MD), L21(MD), L31(MD), L41(MD), L51(MD),
     $  L22(MD), L32(MD), L42(MD), L52(MD), L33(MD),
     $  L43(MD), L53(MD), L44(MD), L54(MD), L55(MD)
C
C   PART 1.  FORWARD BLOCK SWEEP
C
      DO 100 J = JS, JE
C
C   STEP 1.  CONSTRUCT L(I) IN B
C
        IF (J .EQ. JS) GO TO 4
        DO 3 M = 1, 5
        DO 3 N = 1, 5
        DO 3 L = LS, LE
          B(M,N,J,L) = B(M,N,J,L) - A(M,1,J,L)*B(1,N,J-1,L)
     $      - A(M,2,J,L)*B(2,N,J-1,L) - A(M,3,J,L)*B(3,N,J-1,L)
     $      - A(M,4,J,L)*B(4,N,J-1,L) - A(M,5,J,L)*B(5,N,J-1,L)
    3   CONTINUE
C
    4   CONTINUE
C
C   STEP 2.  COMPUTE L INVERSE
C
C   A.  DECOMPOSE L(I) INTO L AND U
C
        DO 20 L = LS, LE
          L11(L) = 1. / B(1,1,J,L)
          U12(L) = B(1,2,J,L)*L11(L)
          U13(L) = B(1,3,J,L)*L11(L)
          U14(L) = B(1,4,J,L)*L11(L)
          U15(L) = B(1,5,J,L)*L11(L)
          L21(L) = B(2,1,J,L)
          L22(L) = 1. / (B(2,2,J,L) - L21(L)*U12(L))
          U23(L) = (B(2,3,J,L) - L21(L)*U13(L)) * L22(L)
          U24(L) = (B(2,4,J,L) - L21(L)*U14(L)) * L22(L)
          U25(L) = (B(2,5,J,L) - L21(L)*U15(L)) * L22(L)
          L31(L) = B(3,1,J,L)
          L32(L) = B(3,2,J,L) - L31(L)*U12(L)
          L33(L) = 1. / (B(3,3,J,L) - L31(L)*U13(L) - L32(L)*U23(L))
          U34(L) = (B(3,4,J,L) - L31(L)*U14(L) - L32(L)*U24(L))
     $      * L33(L)
          U35(L) = (B(3,5,J,L) - L31(L)*U15(L) - L32(L)*U25(L))
     $      * L33(L)
   20   CONTINUE
C
        DO 25 L = LS, LE
          L41(L) = B(4,1,J,L)
          L42(L) = B(4,2,J,L) - L41(L)*U12(L)
          L43(L) = B(4,3,J,L) - L41(L)*U13(L) - L42(L)*U23(L)
          L44(L) = 1. / (B(4,4,J,L) - L41(L)*U14(L) - L42(L)*U24(L)
     $      - L43(L)*U34(L))




     $      - U14(L)*B(4,N,J,L) - U15(L)*CS
   40   CONTINUE
C
  100 CONTINUE
C
C   PART 2.  BACKWARD BLOCK SWEEP
C
      JEM1 = JE - 1
C
      DO 200 J = JEM1, JS, -1
      DO 200 M = 1, 5
      DO 200 L = LS, LE
        S(J,K,L,M) = S(J,K,L,M) - B(M,1,J,L)*S(J+1,K,L,1)
     $    - B(M,2,J,L)*S(J+1,K,L,2) - B(M,3,J,L)*S(J+1,K,L,3)
     $    - B(M,4,J,L)*S(J+1,K,L,4) - B(M,5,J,L)*S(J+1,K,L,5)
  200 CONTINUE
C
      RETURN
      END

C
      SUBROUTINE GMTTST (ER, FP, TM)
C
      PARAMETER (NW = 100, NB = 5, F7 = 78125., T30 = 1073741824.)
      COMPLEX WALL, ZCR, PROJ, ZI, Z1, ZZ
      COMMON /ARRAYS/ NWALL(NB), WALL(NW,NB), RMATRX(NW*NB,NW*NB),
     $  ZCR(NW,NB), PROJ(NW,NB), XMAX(NB)
      DATA IT/2/, ANS/-2.57754233214174/
C
C   INITIALIZATION
C
      LW = 2 * NW * NB
      T2 = F7 / T30
      DO 100 J = 1, NB
        NWALL(J) = NW
  100 CONTINUE
C
      DO 110 J = 1, NB
      DO 110 I = 1, NW
        T1 = MOD (F7 * T2, 1.)
        T2 = MOD (F7 * T1, 1.)
        WALL(I,J) = CMPLX (T1, T2)
  110 CONTINUE
C
      TM = CPTIME ()
C
C   TIMING TEST
C
      DO 120 I = 1, IT
        CALL GMTRY
  120 CONTINUE
C
      TM = CPTIME ()
      ER = ABS ((RMATRX(19,19) - ANS) / ANS)
      FP = IT * (120. * (NB*NW) ** 2 + 0.666 * (NB*NW) ** 3)
C
      RETURN
      END

C
      SUBROUTINE GMTRY
C
C   COMPUTE SOLID-RELATED ARRAYS, GAUSS ELIMINATE THE MATRIX OF WALL
C   INFLUENCE COEFFICIENTS.
C
C   11/30/84  D H BAILEY  REVISED CODE FOR NAS KERNEL TEST
C
      PARAMETER (NW = 100, NB = 5)
      COMPLEX WALL, ZCR, PROJ, ZI, Z1, ZZ
      COMMON /ARRAYS/ NWALL(NB), WALL(NW,NB), RMATRX(NW*NB,NW*NB),
     $  ZCR(NW,NB), PROJ(NW,NB), XMAX(NB)
C
      DATA ARCL /5./, PI /3.141592653589793/, PERIOD /3./
C
C   COMPUTE ARCLENGTH.
C


C 